LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Pre-training techniques have been verified successfully in a variety of NLP
tasks in recent years. Despite the widespread use of pre-training models for
NLP applications, they almost exclusively focus on text-level manipulation,
while neglecting layout and style information that is vital for document image
understanding. In this paper, we propose LayoutLM to jointly model
interactions between text and layout information across scanned document
images, which is beneficial for a great number of real-world document image
understanding tasks such as information extraction from scanned documents.
Furthermore, we also leverage image features to incorporate words' visual
information into LayoutLM. To the best of our knowledge, this is the first time
that text and layout are jointly learned in a single framework for
document-level pre-training. It achieves new state-of-the-art results in
several downstream tasks, including form understanding (from 70.72 to 79.27),
receipt understanding (from 94.02 to 95.24) and document image classification
(from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly
available at https://aka.ms/layoutlm.
Comment: KDD 202
Predicting users' first impressions of website aesthetics with a quantification of perceived visual complexity and colorfulness
Users make lasting judgments about a website's appeal within a split second of seeing it for the first time. This first impression is influential enough to later affect their opinions of a site's usability and trustworthiness. In this paper, we demonstrate a means to predict the initial impression of aesthetics based on perceptual models of a website's colorfulness and visual complexity. In an online study, we collected ratings of colorfulness, visual complexity, and visual appeal of a set of 450 websites from 548 volunteers. Based on these data, we developed computational models that accurately measure the perceived visual complexity and colorfulness of website screenshots. In combination with demographic variables such as a user's education level and age, these models explain approximately half of the variance in the ratings of aesthetic appeal given after viewing a website for only 500 ms.
Engineering and Applied Science